Improving minority class prediction using cost-sensitive ensembles
Authors
Abstract
In this paper, we address the problem of dealing with unbalanced datasets in the context of classification, i.e. datasets in which some of the classes contain significantly more objects than the other(s). We show that this problem can be addressed by carefully choosing the classifiers that form the committee of a multiple classifier system. In particular, we propose to design such an ensemble on the basis of the cost of the elementary classifiers, as given by a cost matrix. To ensure the diversity of the ensemble, each of the base classifiers is trained on a random subspace. This improves the recognition rate of the minority class, which is typically low when canonical classifiers are used. We evaluate the proposed algorithm on a variety of benchmark datasets and show that it significantly outperforms the base cost-sensitive classifier and its boosted version. The results confirm that our approach is a useful tool for dealing with unbalanced datasets.
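The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the nearest-centroid base learner, the selection rule (keep the candidates with the lowest total cost), the minimum-expected-cost vote, the toy data, and all function and parameter names below are assumptions chosen only to make the example self-contained.

```python
import random

def train_centroid(X, y, features):
    """Fit a nearest-centroid classifier restricted to a feature subset."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return features, centroids

def predict_one(model, row):
    """Return the class whose centroid is closest in the model's subspace."""
    features, centroids = model
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))
    return min(sorted(centroids), key=dist)

def total_cost(model, X, y, cost):
    """Sum cost[true][predicted] over a labelled set."""
    return sum(cost[t][predict_one(model, r)] for r, t in zip(X, y))

def build_ensemble(X, y, cost, n_candidates=20, subspace_size=2, keep=5, seed=0):
    """Train candidates on random subspaces and keep the cheapest ones.

    Assumption: cost is measured on the training set for brevity;
    a held-out validation set would be the more careful choice.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    candidates = []
    for _ in range(n_candidates):
        feats = sorted(rng.sample(range(n_features), subspace_size))
        candidates.append(train_centroid(X, y, feats))
    candidates.sort(key=lambda m: total_cost(m, X, y, cost))
    return candidates[:keep]

def ensemble_predict(ensemble, row, cost):
    """Pick the class with the lowest expected cost under the vote shares."""
    classes = sorted(cost)
    votes = {c: 0 for c in classes}
    for m in ensemble:
        votes[predict_one(m, row)] += 1
    n = len(ensemble)
    def expected(pred):
        return sum(votes[t] / n * cost[t][pred] for t in classes)
    return min(classes, key=expected)

# Toy unbalanced data (hypothetical): 40 majority objects (class 0) near the
# origin and 5 minority objects (class 1) near 3 in every feature.
X = [[(i % 5) * 0.2, (i % 3) * 0.3, (i % 4) * 0.25, (i % 2) * 0.5] for i in range(40)]
X += [[3 + (i % 2) * 0.2, 3 + (i % 3) * 0.1, 3.0, 3 + (i % 2) * 0.1] for i in range(5)]
y = [0] * 40 + [1] * 5

# cost[true][predicted]: missing a minority object is 5 times as costly
# as a false alarm (an assumed, illustrative cost matrix).
cost = {0: {0: 0, 1: 1}, 1: {0: 5, 1: 0}}

ensemble = build_ensemble(X, y, cost)
print(ensemble_predict(ensemble, [3.1, 3.1, 3.1, 3.1], cost))
print(ensemble_predict(ensemble, [0.1, 0.1, 0.1, 0.1], cost))
```

Because the final decision minimises expected cost rather than counting votes, a minority prediction can win even when fewer than half of the members vote for it, which is one simple way a cost matrix can raise minority-class recall.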
Similar resources
Measuring Accuracy between Ensemble Methods: AdaBoost.NC vs. SMOTE.ENN
Imbalanced class distribution is one of the main issues in data mining. The problem also exists in the multi-class setting, when the number of samples in one class is much greater or lower than in the other classes. Most existing imbalance learning techniques are only designed and tested for two-class scenarios. The new negative correlation learning (NCL) algorithm for classification ensembles, called A...
Making Accurate Credit Risk Predictions with Cost-Sensitive MLP Neural Networks
In practical applications of credit risk evaluation, prediction models often make inaccurate decisions because of the lack of sufficient default data. The challenging issue of a highly skewed class distribution between defaulters and non-defaulters is addressed here by means of an algorithmic solution based on cost-sensitive learning. The present study is conducted on the popular Multilayer Percep...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Using Random Forest to Learn Imbalanced Data
In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, F-measure and weighted accuracy are computed. Both methods are shown to improve the prediction accurac...
Actively Balanced Bagging for Imbalanced Data
Under-sampling extensions of bagging are currently the most accurate ensembles specialized for class-imbalanced data. Nevertheless, since improved recognition of the minority class in this type of ensemble is usually associated with decreased recognition of the majority classes, we introduce a new two-phase ensemble called Actively Balanced Bagging. The proposal is to first learn a...
Publication date: 2011